Robust Sensor Fusion for Indoor Wireless Localization
Location knowledge in indoor environments, obtained through Indoor Positioning Systems
(IPS), has become very useful and popular in recent years. Indoor wireless
localization, however, suffers from severe multi-path fading and non-line-of-sight
conditions. This paper presents a novel indoor localization framework based on
sensor fusion of Zigbee Wireless Sensor Networks (WSN) using Received Signal
Strength (RSS). The target at the unknown position is equipped with two or more
mobile nodes, and the range between any two mobile nodes is fixed a priori. The
attitude (roll, pitch, and yaw) of each mobile node is measured by inertial sensors (ISs). Then
the angle and the range between any two nodes can be obtained, and thus the
path between the two nodes can be modeled as a curve. Through an efficient
cooperation between two or more mobile nodes, this framework effectively
exploits RSS techniques. This constraint helps improve the positioning
accuracy. Theoretical analysis of localization distortion and Monte Carlo
simulations show that the proposed cooperative strategy of multiple nodes with
an extended Kalman filter (EKF) achieves significantly higher positioning accuracy
than existing systems, especially in heavily obstructed scenarios.
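The abstract does not give the paper's exact measurement model, but the core EKF range update it relies on can be sketched generically. The snippet below is a minimal illustration, assuming a hypothetical 2-D position state and a single known anchor; it shows how a nonlinear range observation is linearized and folded into the estimate:

```python
import numpy as np

def ekf_range_update(x, P, z, anchor, R):
    """One EKF measurement update for a 2-D position state x = [px, py],
    given a range observation z to a known anchor position.
    The measurement model h(x) = ||x - anchor|| is nonlinear, so it is
    linearized via its Jacobian at the current estimate."""
    d = x - anchor
    r = np.linalg.norm(d)
    H = (d / r).reshape(1, 2)          # Jacobian of h(x) at the estimate
    y = z - r                          # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K.flatten() * y
    P_new = (np.eye(2) - K @ H) @ P
    return x_new, P_new
```

In the paper's cooperative setting, the a-priori fixed inter-node range would supply an additional constraint of the same form, tightening the fused estimate.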
Quaternion MLP Neural Networks Based on the Maximum Correntropy Criterion
We propose a gradient ascent algorithm for quaternion multilayer perceptron
(MLP) networks based on the cost function of the maximum correntropy criterion
(MCC). In the algorithm, we use the split quaternion activation function based
on the generalized Hamilton-real quaternion gradient. By introducing a new
quaternion operator, we first rewrite the earlier quaternion single-layer
perceptron algorithm. Second, we propose a gradient descent algorithm for
quaternion multilayer perceptron based on the cost function of the mean square
error (MSE). Finally, the MSE algorithm is extended to the MCC algorithm.
Simulations show the feasibility of the proposed method.
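The quaternion algebra aside, the difference between the MSE and MCC objectives is easy to illustrate on a plain real-valued linear model. The sketch below uses hypothetical data; the Gaussian kernel width `sigma` is a free parameter. It performs gradient ascent on the correntropy of the errors, which automatically down-weights outliers:

```python
import numpy as np

def correntropy(err, sigma=1.0):
    """Maximum correntropy criterion: mean Gaussian kernel of the errors.
    Unlike MSE, large (outlier) errors contribute almost nothing."""
    return np.mean(np.exp(-err**2 / (2 * sigma**2)))

def mcc_gradient_step(w, X, y, lr=0.5, sigma=1.0):
    """One gradient-ascent step on correntropy for a linear model y ~ X @ w."""
    err = y - X @ w
    # each sample's pull is scaled by its kernel value, so outliers are muted
    weight = np.exp(-err**2 / (2 * sigma**2)) * err / sigma**2
    grad = X.T @ weight / len(y)
    return w + lr * grad
```

Because the kernel of a large error is nearly zero, an outlier contributes almost nothing to the gradient, whereas under MSE it would dominate the fit.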
Variational Bayesian Approximations Kalman Filter Based on Threshold Judgment
The estimation of non-Gaussian measurement noise models is a significant
challenge across various fields. In practical applications, estimation is often
hampered by the large number of parameters and the high computational
complexity involved. This paper proposes a threshold-based Kalman filtering approach for
online estimation of noise parameters in non-Gaussian measurement noise models.
This method uses a certain amount of sample data to infer the variance
threshold of observation parameters and employs variational Bayesian estimation
to obtain corresponding noise variance estimates, enabling subsequent
iterations of the Kalman filtering algorithm. Finally, we evaluate the
performance of this algorithm through simulation experiments, demonstrating its
accurate and effective estimation of state and noise parameters.
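The abstract does not spell out the update equations; a common simplification of this idea, for a scalar random-walk state with an inverse-gamma variational factor on the measurement-noise variance, looks roughly as follows (the paper's threshold-judgment step is omitted, and the fixed-point iteration count is an assumption):

```python
import numpy as np

def vb_kf_step(m, P, z, Q, alpha, beta, n_iter=5):
    """One predict/update step of a Kalman filter that also estimates the
    measurement-noise variance by variational Bayes, for a scalar
    random-walk state with an identity observation model."""
    m_pred, P_pred = m, P + Q          # predict
    alpha += 0.5                       # inverse-gamma shape grows per sample
    beta_new = beta
    # fixed-point iteration between the state and noise-variance factors
    for _ in range(n_iter):
        R = beta_new / alpha           # current noise-variance estimate
        K = P_pred / (P_pred + R)
        m_new = m_pred + K * (z - m_pred)
        P_new = (1 - K) * P_pred
        beta_new = beta + 0.5 * ((z - m_new) ** 2 + P_new)
    return m_new, P_new, alpha, beta_new
```

Each measurement tightens both the state posterior and the noise-variance posterior, so the filter adapts its effective R online.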
Prototypical Residual Networks for Anomaly Detection and Localization
Anomaly detection and localization are widely used in industrial
manufacturing for their efficiency and effectiveness. Anomalies are rare and hard
to collect, and supervised models trained on only a handful of abnormal samples
easily over-fit to these seen anomalies, producing unsatisfactory performance.
On the other hand, anomalies are typically subtle, hard to discern, and varied
in appearance, making it difficult to detect anomalies, let alone locate
anomalous regions. To address these issues, we propose a framework called
Prototypical Residual Network (PRN), which learns feature residuals of varying
scales and sizes between anomalous and normal patterns to accurately
reconstruct the segmentation maps of anomalous regions. PRN mainly consists of
two parts: multi-scale prototypes that explicitly represent the residual
features of anomalies relative to normal patterns, and a multi-size self-attention
mechanism that enables learning of variable-sized anomalous features. In addition, we present a
variety of anomaly generation strategies that consider both seen and unseen
appearance variance to enlarge and diversify anomalies. Extensive experiments
on the challenging and widely used MVTec AD benchmark show that PRN outperforms
current state-of-the-art unsupervised and supervised methods. We further report
SOTA results on three additional datasets to demonstrate the effectiveness and
generalizability of PRN.
Comment: Accepted by CVPR 202
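As a toy illustration of the residual idea (not the paper's architecture), a patch feature can be scored by its residual to the nearest "normal" prototype; features that deviate from normal patterns leave large residuals. The prototypes and features below are hypothetical:

```python
import numpy as np

def prototype_residual(feat, prototypes):
    """Residual of a feature vector w.r.t. its nearest normal prototype.
    A large residual norm suggests the feature deviates from normal patterns."""
    dists = np.linalg.norm(prototypes - feat, axis=1)
    nearest = prototypes[np.argmin(dists)]
    return feat - nearest

protos = np.array([[0.0, 0.0], [1.0, 1.0]])   # toy "normal" prototypes
normal_score = np.linalg.norm(prototype_residual(np.array([0.1, 0.0]), protos))
anomaly_score = np.linalg.norm(prototype_residual(np.array([3.0, -1.0]), protos))
```

In PRN these residuals are computed at multiple scales from learned prototypes and decoded into a segmentation map, rather than reduced to a single score as here.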
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models
Latent Diffusion Models (LDMs) are renowned for their powerful capabilities
in image and video synthesis. Yet, video editing methods suffer from
insufficient pre-training data or costly video-by-video re-training. To
address this gap, we propose FLDM (Fused Latent Diffusion Model), a
training-free framework to achieve text-guided video editing by applying
off-the-shelf image editing methods in video LDMs. Specifically, FLDM fuses
latents from an image LDM and a video LDM during the denoising process. In
this way, the temporal consistency of the video LDM can be kept while the
high fidelity of the image LDM can also be exploited. Meanwhile, FLDM offers high
flexibility, since both the image LDM and the video LDM can be replaced, so advanced
image editing methods such as InstructPix2Pix and ControlNet can be leveraged.
To the best of our knowledge, FLDM is the first method to adapt off-the-shelf
image editing methods into video LDMs for video editing. Extensive quantitative
and qualitative experiments demonstrate that FLDM can improve the textual
alignment and temporal consistency of edited videos.
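The fusion itself can be pictured as a simple per-step combination of the two models' latents. This is only a sketch; the abstract does not specify the actual fusion weights or schedule during denoising:

```python
import numpy as np

def fuse_latents(z_image, z_video, alpha=0.5):
    """Convex combination of per-frame image-LDM latents and video-LDM latents
    at one denoising step: alpha trades per-frame image fidelity against the
    temporal consistency carried by the video latents."""
    return alpha * z_image + (1.0 - alpha) * z_video
```

A larger `alpha` leans on the image editor's fidelity; a smaller one leans on the video model's temporal coherence.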
On the Importance of Spatial Relations for Few-shot Action Recognition
Deep learning has achieved great success in video recognition, yet still
struggles to recognize novel actions when faced with only a few examples. To
tackle this challenge, few-shot action recognition methods have been proposed
to transfer knowledge from a source dataset to a novel target dataset with only
one or a few labeled videos. However, existing methods mainly focus on modeling
the temporal relations between the query and support videos while ignoring the
spatial relations. In this paper, we find that the spatial misalignment between
objects also occurs in videos, and is notably more common than temporal
inconsistency. We are thus motivated to investigate the importance of spatial
relations and propose a more accurate few-shot action recognition method that
leverages both spatial and temporal information. In particular, we contribute
a novel Spatial Alignment Cross Transformer (SA-CT), which learns to re-adjust
the spatial relations and to incorporate temporal information. Experiments
reveal that, even without using any temporal information, the performance of
SA-CT is comparable to temporal-based methods on 3/4 benchmarks. To further
incorporate the temporal information, we propose a simple yet effective
Temporal Mixer module. The Temporal Mixer enhances the video representation and
improves the performance of the full SA-CT model, achieving very competitive
results. In this work, we also exploit large-scale pretrained models for
few-shot action recognition, providing useful insights for this research
direction.
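The abstract does not detail SA-CT's internals, but the spatial re-adjustment it describes is in the spirit of cross-attention, where each spatial location of the query video attends over all locations of the support video, regardless of where an object appears. A minimal sketch with hypothetical flattened spatial features:

```python
import numpy as np

def cross_attention(query_feats, support_feats):
    """Scaled dot-product cross-attention: each query spatial feature attends
    to all support features, yielding spatially re-aligned support
    information (no learned projections, for brevity)."""
    d = query_feats.shape[-1]
    scores = query_feats @ support_feats.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ support_feats
```

Because attention matches features by similarity rather than by grid position, a misplaced object in the support video can still be aligned with its counterpart in the query.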
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
Leveraging large-scale data can introduce performance gains on many computer
vision tasks. Unfortunately, this does not happen in object detection when
a single model is trained on multiple datasets together. We observe two main
obstacles: taxonomy differences and bounding-box annotation inconsistency, which
introduce domain gaps across datasets that prevent joint
training. In this paper, we show that these two challenges can be effectively
addressed by simply adapting object queries on language embedding of categories
per dataset. We design a detection hub to dynamically adapt queries on category
embedding based on the different distributions of datasets. Unlike previous
methods that attempted to learn a joint embedding for all datasets, our adaptation
method can use the language embeddings as semantic centers for common
categories, while learning a semantic bias towards the specific categories
belonging to each dataset, handling annotation differences and bridging
the domain gaps. These novel improvements enable us to train a
single detector end-to-end on multiple datasets simultaneously and to take
full advantage of them. Further experiments on joint training on multiple datasets
demonstrate significant performance gains over separately fine-tuned
individual detectors.
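The query-adaptation idea can be sketched in a few lines: language embeddings of category names act as shared semantic centers, and a small learned per-dataset bias absorbs taxonomy and annotation differences. All embeddings and biases below are hypothetical stand-ins, not the paper's learned values:

```python
import numpy as np

# Hypothetical language embeddings of category names (shared semantic centers)
EMBED = {"person": np.array([1.0, 0.0, 0.0, 0.0]),
         "car":    np.array([0.0, 1.0, 0.0, 0.0])}

def adapt_queries(categories, dataset_bias):
    """Object queries for one dataset: the shared category embedding plus a
    learned per-dataset bias that absorbs annotation/taxonomy differences."""
    return np.stack([EMBED[c] + dataset_bias for c in categories])

# Stand-ins for two datasets' learned biases
q_a = adapt_queries(["person", "car"], np.full(4, 0.05))
q_b = adapt_queries(["person", "car"], np.full(4, -0.05))
```

Queries for the same category stay close across datasets (they share a semantic center), while the per-dataset bias leaves room for each dataset's annotation conventions.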